SOLR-7632 TikaServer as pluggable backend to existing extraction handler #3670
base: main
…ika API
Refactor some tests to LocalTikaExtractionBackendTest
Exciting!
Status:
TBD:
Anyone, please feel free to hack away on this if it looks exciting, committing directly to the PR branch. Question: Would it bring value to isolate the refactoring in one PR and then another one to add the tikaserver impl?
Cleanup TestContainer
Refactor ExtractionMetadata
Add returnType to ExtractionRequest
Remove static initializers
cc3d43f to a3794ce
Any luck with the Security Manager? I had many difficulties.
Testcontainers and docker don't love the SecurityManager. I had claude try to run the tests and add additional permissions to
Yea, that’s annoying. Perhaps we could disable JSM for this test or for tests in the entire module?
I had a similar experience when I was upgrading Kafka. And then I stopped.
Java Security Manager and Testcontainers do not play nicely together. We prefer Testcontainers, so disable JSM
When I first saw
Add common metadata
Adjust some tests with dc:title instead of title
Support passwords in TikaServer backend
solr/modules/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml
 * @deprecated Will be replaced with something similar that calls out to a separate Tika Server
 * process running in its own JVM.
 */
@Deprecated(since = "9.10.0")
@epugh I undeprecated this and the Loader, and instead deprecated the Local backend. This part needs to be backported before the 9.10 release. Also perhaps wording in major-changes...
totally!
Add some thread names to filter
Validate path of tikaConfigLoc
Properly close resources
Cleaner usage of reader
https://issues.apache.org/jira/browse/SOLR-7632
This work builds on the one in #3361, but instead of making a new module, we add it as a capability to the existing extraction handler by specifying `extraction.backend=tikaserver`.
This first required refactoring the extraction handler to detach it from the Tika v1 API. There is a new interface `ExtractionBackend` that takes a generic `ExtractionRequest` object in and returns an `ExtractionResult` bean, and a new `LocalTikaExtractionBackend` implementation that encapsulates all Tika v1 API handling. This implementation can be deprecated, and in Solr 10, the `tikaserver` one can be made default.
All existing tests pass, and most of the existing extraction tests now also pass when running the `tikaserver` backend (running in Testcontainers). Unfortunately Docker is not available in Crave, so a new GH workflow is made to run only the extraction tests.
TODO's: